Efficient iterative policy optimization
Abstract
We tackle the issue of finding a good policy when the number of policy updates is limited. This is done by approximating the expected policy reward as a sequence of concave lower bounds which can be efficiently maximized, drastically reducing the number of policy updates required to achieve good performance. We also extend existing methods to negative rewards, enabling the use of control variates.
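The bounding idea can be illustrated on a toy problem. The sketch below is not taken from the paper. It assumes a softmax policy over three actions with hypothetical nonnegative rewards, and it uses the elementary inequality x >= 1 + log x to build a lower bound on the expected reward around a reference policy. The names `softmax`, `expected_reward`, `lower_bound` and all numeric values are assumptions chosen for illustration.

```python
import numpy as np


def softmax(theta):
    """Numerically stable softmax over a 1-D parameter vector."""
    z = theta - np.max(theta)
    e = np.exp(z)
    return e / e.sum()


def expected_reward(theta, rewards):
    """Exact expected reward of the softmax policy: sum_a pi_theta(a) * r(a)."""
    return float(softmax(theta) @ rewards)


def lower_bound(theta, theta0, rewards):
    """Lower bound E_{a~pi0}[ r(a) * (1 + log(pi_theta(a) / pi0(a))) ].

    Valid only when every reward is nonnegative, because it applies
    x >= 1 + log(x) termwise to the importance ratio x = pi_theta / pi0.
    """
    pi0 = softmax(theta0)
    pi = softmax(theta)
    return float(np.sum(pi0 * rewards * (1.0 + np.log(pi / pi0))))


# Hypothetical toy problem: three actions, nonnegative rewards.
rewards = np.array([1.0, 0.2, 0.5])
theta0 = np.zeros(3)                  # reference (current) policy parameters
theta = np.array([0.8, -0.3, 0.1])    # candidate policy parameters

# Gap between the true objective and the bound at the reference point
# and at the candidate point.
gap_at_theta0 = expected_reward(theta0, rewards) - lower_bound(theta0, theta0, rewards)
gap_at_theta = expected_reward(theta, rewards) - lower_bound(theta, theta0, rewards)
```

Under these assumptions the bound touches the true objective at `theta0` and stays below it elsewhere. It is concave in `theta` because the log of a softmax probability is concave in its parameters. If a reward is negative, the termwise inequality reverses and this bound no longer holds, which is one way to see why negative rewards need separate treatment.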
Similar resources
An Efficient Heuristic Optimization Algorithm for a Two-Echelon (R, Q) Inventory System
This paper presents a two-echelon non-repairable spare parts inventory system that consists of one warehouse and m identical retailers and implements the reorder point, order quantity (R, Q) inventory policy. We formulate the policy decision problem in order to minimize the total annual inventory investment subject to average annual ordering frequency and expected number of backorder constraint...
Scalable and Fair Admission Control for On-Chip Nanophotonic Crossbars
Advances in CMOS-compatible photonic elements have made it plausible to exploit nanophotonic communications to overcome the limitations of traditional NoCs. Amongst various proposed nanophotonic architectures, optical crossbars have been shown to provide high performance in terms of bandwidth and latency. In general, optical crossbars provide a vast volume of network resources that are shared a...
Energy-Efficient Cognitive Radio Sensor Networks: Parametric and Convex Transformations
Designing energy-efficient cognitive radio sensor networks is important to intelligently use battery energy and to maximize the sensor network life. In this paper, the problem of determining the power allocation that maximizes the energy-efficiency of cognitive radio-based wireless sensor networks is formed as a constrained optimization problem, where the objective function is the ratio of netw...
An efficient improvement of the Newton method for solving nonconvex optimization problems
Newton method is one of the most famous numerical methods among the line search methods to minimize functions. It is well known that the search direction and step length play important roles in this class of methods to solve optimization problems. In this investigation, a new modification of the Newton method to solve unconstrained optimization problems is presented. The significant ...
An Iterative Heuristic Optimization Model for Multi-Echelon (R, Q) Inventory Systems
Large multi-echelon inventory systems usually consist of hundreds of thousands of stock keeping units (SKUs). Calculating inventory policies for each product is a computational burden that calls for more efficient policy-setting techniques that reduce computational time and increase managerial convenience. The main objective of our research is to investigate the effect of segmentat...
Journal title:
- CoRR
Volume: abs/1612.08967
Publication date: 2016